
(2018) Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Athalye A, Carlini N, Wagner D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples[J]. arXiv preprint arXiv:1802.00420, 2018.



1. Overview


In this paper, the authors:

  • identify three types of obfuscated gradients (shattered gradients, stochastic gradients, and vanishing/exploding gradients)
  • propose Backward Pass Differentiable Approximation (BPDA) to attack defenses that rely on obfuscated gradients

1.1. Dataset

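  • MNIST & CIFAR-10. untargeted attacks
  • ImageNet. 1000 randomly selected test images, targeted attacks
  • targeted attacks are the harder task for the attacker; a defender only needs to show robustness to untargeted attacks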

1.2. Network

  • MNIST. 5-layer convolutional network
  • CIFAR-10. ResNet
  • ImageNet. InceptionV3

1.3. Attacker

  • white-box: full access to the model architecture and weights (but not to the test-time randomness)

1.4. Obfuscated Gradient

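  • Shattered Gradient. the defense is non-differentiable or numerically unstable, so the gradient is nonexistent or incorrect
  • Stochastic Gradient. the defense is randomized at test time
  • Exploding & Vanishing Gradient. the defense runs multiple iterations of network evaluation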
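
To see why a shattered gradient misleads a gradient-based attacker, the sketch below estimates the gradient through a bit-depth quantizer by finite differences. The function names, the 8-level quantizer, and the squared loss are illustrative assumptions, not taken from the paper.

```python
def quantize(x, levels=8):
    """Bit-depth reduction: snap x in [0, 1] to one of `levels` grid values."""
    return round(x * (levels - 1)) / (levels - 1)

def finite_diff(fn, x, eps=1e-4):
    """Central finite-difference estimate of d fn / dx."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

def defended_loss(x):
    """Hypothetical loss evaluated on the quantized input."""
    return (quantize(x) - 1.0) ** 2

def undefended_loss(x):
    """The same hypothetical loss without the quantizer."""
    return (x - 1.0) ** 2

x = 0.40
# Both probes land in the same quantization bin, so the estimate is exactly 0.
grad_defended = finite_diff(defended_loss, x)
# Without the quantizer the loss has a useful, non-zero slope.
grad_undefended = finite_diff(undefended_loss, x)
print(grad_defended, grad_undefended)
```

The zero gradient does not mean the defended model is robust; it only means that this particular gradient signal carries no information, which is the situation the attacks in Section 2 are designed to work around.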



2. Attack Methods


2.1. Shattered Gradient

2.1.1. Simple

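  • preprocessor g() satisfies g(x) ≈ x.
    • so on the backward pass the derivative of g can be approximated by that of the identity: ∇_x f(g(x)) at x = x̂ ≈ ∇_x f(x) at x = g(x̂)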


2.1.2. BPDA

  • find a differentiable approximation g() such that g(x) ≈ f_i(x), where f_i is a non-differentiable layer of the network

  • forward. through f_i(x)
  • backward. replacing f_i(x) with g(x)
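
The forward/backward split above can be sketched on a toy model. This is a minimal illustration, assuming a hypothetical logistic-regression classifier (weights `W`, bias `B`), a coarse quantizer as the non-differentiable layer f_i, and g(x) = x as its backward-pass replacement; none of these specifics come from the paper.

```python
import math

# Hypothetical toy classifier: logistic regression on a quantized input.
W = [2.0, -3.0, 1.5]
B = 0.1

def quantize(x, levels=4):
    """Non-differentiable preprocessor f_i: snap each value to a coarse grid."""
    return [round(v * (levels - 1)) / (levels - 1) for v in x]

def prob_class1(z):
    """Probability of class 1 for an already-preprocessed input z."""
    s = sum(w * v for w, v in zip(W, z)) + B
    return 1.0 / (1.0 + math.exp(-s))

def defended_prob(x):
    """Full defended pipeline: classifier applied to the quantized input."""
    return prob_class1(quantize(x))

def bpda_grad(x, y):
    """Cross-entropy gradient w.r.t. x, with quantize replaced by g(x) = x
    on the backward pass only."""
    z = quantize(x)                   # forward pass uses the real layer f_i
    p = prob_class1(z)
    return [(p - y) * w for w in W]   # dL/dz times the identity Jacobian of g

def bpda_attack(x0, y, eps=0.4, step=0.05, iters=40):
    """l_inf projected gradient ascent on the loss, using BPDA gradients."""
    x = list(x0)
    for _ in range(iters):
        g = bpda_grad(x, y)
        # Signed step, then projection onto the eps-ball and the [0, 1] box.
        x = [v + step * ((gi > 0) - (gi < 0)) for v, gi in zip(x, g)]
        x = [min(max(v, v0 - eps, 0.0), min(v0 + eps, 1.0))
             for v, v0 in zip(x, x0)]
    return x

x0 = [0.8, 0.2, 0.6]
x_adv = bpda_attack(x0, y=1)
print(defended_prob(x0), defended_prob(x_adv))
```

The attack is evaluated against `defended_prob`, which still includes the real quantizer; only the gradient computation substitutes the identity.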

2.2. Stochastic Gradient

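  • apply Expectation over Transformation (EOT): for a defense that applies a random transformation t ~ T before the classifier f, optimize the expectation E_{t~T} f(t(x)), using ∇ E_{t~T} f(t(x)) = E_{t~T} ∇ f(t(x)) and estimating the right-hand side by sampling t at each gradient step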
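
A minimal sketch of the EOT gradient estimate, assuming a hypothetical randomized defense (random rescaling plus Gaussian noise) in front of a toy logistic-regression classifier. The transformation, the weights, and the sample count are illustrative assumptions.

```python
import math
import random

# Hypothetical toy classifier weights.
W = [2.0, -3.0, 1.5]
B = 0.1

def sample_transform(rng):
    """Hypothetical randomized defense: one random rescaling plus noise per call."""
    scale = rng.uniform(0.7, 1.3)
    noise = [rng.gauss(0.0, 0.1) for _ in W]
    return scale, noise

def grad_one_sample(x, y, scale, noise):
    """Cross-entropy gradient w.r.t. x for one fixed draw of the transformation."""
    t = [scale * v + n for v, n in zip(x, noise)]
    s = sum(w * v for w, v in zip(W, t)) + B
    p = 1.0 / (1.0 + math.exp(-s))
    return [(p - y) * scale * w for w in W]   # chain rule through t(x) = scale*x + noise

def eot_grad(x, y, rng, n_samples=64):
    """Monte Carlo estimate of E_t[grad] obtained by averaging per-sample gradients."""
    total = [0.0] * len(x)
    for _ in range(n_samples):
        scale, noise = sample_transform(rng)
        g = grad_one_sample(x, y, scale, noise)
        total = [a + b for a, b in zip(total, g)]
    return [a / n_samples for a in total]

rng = random.Random(0)
g = eot_grad([0.8, 0.2, 0.6], y=1, rng=rng)
print(g)
```

A single draw gives a noisy gradient that changes from call to call; averaging over many draws gives a direction that is adversarial in expectation, which is what a randomized defense is evaluated against.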


2.3. Exploding&Vanishing Gradient

2.3.1. Reparameterization

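  • For f(g(x)), g performs an optimization loop that transforms the input x into a new input; differentiating through the loop yields exploding or vanishing gradients

    • make a change of variable x = h(z)

  • find a differentiable h such that g(h(z)) = h(z) for all z, then compute gradients through f(h(z))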
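
The change of variable can be illustrated on a two-dimensional toy problem. In this sketch, which is an assumption-laden illustration rather than the paper's setup, the defense `g` projects an input onto the line {(z, 2z)} with an inner gradient-descent loop, `h(z) = (z, 2z)` returns points only on that line, and the classifier weights are hypothetical.

```python
import math

# Hypothetical toy classifier on 2-D inputs.
W = [1.0, 1.0]
B = -2.0

def f(u):
    """Probability of class 1."""
    s = sum(w * v for w, v in zip(W, u)) + B
    return 1.0 / (1.0 + math.exp(-s))

def h(z):
    """Differentiable map whose outputs always lie on the line {(z, 2z)}."""
    return [z, 2.0 * z]

def g(x, iters=200, lr=0.05):
    """Defense: project x onto the range of h with an inner optimization loop."""
    z = 0.0
    for _ in range(iters):
        hz = h(z)
        # d/dz ||h(z) - x||^2 = 2*(hz[0] - x[0]) + 4*(hz[1] - x[1])
        z -= lr * (2.0 * (hz[0] - x[0]) + 4.0 * (hz[1] - x[1]))
    return h(z)

def attack(z0, y, eps=0.5, step=0.02, iters=50):
    """Signed gradient ascent on the loss in z-space; never differentiates g."""
    z = z0
    for _ in range(iters):
        p = f(h(z))
        grad_z = (p - y) * (W[0] * 1.0 + W[1] * 2.0)   # chain rule through h
        z += step * ((grad_z > 0) - (grad_z < 0))
        # h(z) - h(z0) = (dz, 2*dz), so |dz| <= eps/2 keeps the l_inf norm <= eps.
        z = min(max(z, z0 - eps / 2.0), z0 + eps / 2.0)
    return h(z)

x0 = h(0.8)
x_adv = attack(0.8, y=1)
print(f(g(x0)), f(g(x_adv)))
```

Because `x_adv` already lies on the line, the projection `g` leaves it essentially unchanged, so the gradient computed through `f(h(z))` transfers to the defended pipeline `f(g(x))`.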





3. Experiments


3.1. Adversarial Training

  • has been shown to be difficult at ImageNet scale. Adversarial Machine Learning at Scale
  • training exclusively on l∞ adversarial examples provides only limited robustness to adversarial examples under other distortion metrics. Attacking the Madry Defense Model with L1-based Adversarial Examples

3.2. Shattered Gradient

  • thermometer encoding. BPDA-backward
  • cropping & rescaling. EOT
  • bit-depth reduction. BPDA-identity
  • JPEG compression. BPDA-identity
  • total variance minimization (TVM). EOT+BPDA
  • image quilting. EOT+BPDA
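
For the thermometer-encoding entry, "BPDA-backward" means keeping the discrete encoding on the forward pass and substituting a smooth surrogate on the backward pass. The sketch below assumes an l-level encoding with thresholds k/l and a clipped-ramp surrogate min(max(x − k/l, 0), 1); treat both the exact thresholds and the surrogate as assumptions of this example, and the helper names as hypothetical.

```python
def thermometer(x, levels=10):
    """Discrete l-level thermometer code: entry k is 1 if x > k/levels."""
    return [1.0 if x > k / levels else 0.0 for k in range(1, levels + 1)]

def soft_thermometer(x, levels=10):
    """Assumed differentiable surrogate: a clipped ramp per threshold."""
    return [min(max(x - k / levels, 0.0), 1.0) for k in range(1, levels + 1)]

def soft_thermometer_grad(x, levels=10):
    """Derivative of each surrogate entry w.r.t. x (1 where the ramp is active)."""
    return [1.0 if 0.0 < x - k / levels < 1.0 else 0.0
            for k in range(1, levels + 1)]

def bpda_input_grad(x, upstream, levels=10):
    """Backward pass: combine the upstream gradient w.r.t. the code entries
    with the surrogate's derivative instead of the (zero) true derivative."""
    return sum(u * d for u, d in zip(upstream, soft_thermometer_grad(x, levels)))

code = thermometer(0.66)                       # forward pass: real encoding
grad_x = bpda_input_grad(0.66, [0.5] * 10)     # backward pass: surrogate
print(code, grad_x)
```

Here `upstream` stands in for the gradient of the loss with respect to each code entry, which in a real attack would come from backpropagation through the classifier.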

3.3. Stochastic Gradient

  • SAP (Stochastic Activation Pruning: random dropout at each layer). EOT

3.4. Vanishing&Exploding Gradient

  • PixelDefend
  • Defense-GAN

3.5. Results